Brush-Based Ranking For Navigating Within High-Dimensional Datasets
Abstract
Analyzing high-dimensional data poses a major challenge, as most common visualization techniques do not scale well to displaying a large number of attributes at once. The initial questions that arise when analyzing a new dataset therefore typically concern the dimensions themselves: assessing the relevance of individual attributes and identifying clusters of similar (i.e., highly correlated) attributes. Once this first step is complete, entry-related tasks such as detecting outliers or clusters of similar entries can be dealt with more efficiently in a second step. In this paper, we describe an approach that guides the user through a high-dimensional dataset by ranking dimensions and pairs of dimensions according to a large number of statistical summaries. The option to restrict the computations to subsets of the data (e.g., defined interactively by brushing a linked view) and to statistically compare various subsets makes this approach more powerful and more widely applicable, as illustrated by means of a biological dataset.
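The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical summaries used, so the two chosen here are assumptions: a standardized mean shift between a brushed subset and the full data (for single dimensions) and the absolute Pearson correlation (for pairs of dimensions). The function names `rank_dimensions` and `rank_pairs`, the `brushed` parameter, and the toy data are hypothetical and serve only as illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Toy dataset (made up for illustration): one tuple per entry, one value per dimension.
NAMES = ["a", "b", "c"]
ROWS = [
    (1.0, 10.0, 5.0),
    (2.0, 20.0, 5.1),
    (3.0, 30.0, 4.9),
    (4.0, 40.0, 5.0),
    (10.0, 41.0, 5.0),
]


def select(rows, brushed):
    """Return the brushed entries, or all entries when nothing is brushed."""
    subset = rows if brushed is None else [rows[i] for i in brushed]
    if not subset:
        raise ValueError("the brushed subset must contain at least one entry")
    return subset


def pearson(xs, ys):
    """Pearson correlation; defined as 0.0 here when either column is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_dimensions(rows, names, brushed=None):
    """Rank dimensions by |subset mean - full mean| / full standard deviation.

    Assumed summary: dimensions in which the brushed subset deviates most
    from the whole dataset come first.
    """
    subset = select(rows, brushed)
    scores = []
    for j, name in enumerate(names):
        full_col = [r[j] for r in rows]
        sub_col = [r[j] for r in subset]
        sd = pstdev(full_col)
        shift = abs(mean(sub_col) - mean(full_col))
        scores.append((name, shift / sd if sd > 0 else 0.0))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def rank_pairs(rows, names, brushed=None):
    """Rank pairs of dimensions by absolute Pearson correlation within the subset."""
    subset = select(rows, brushed)
    scores = []
    for i, j in combinations(range(len(names)), 2):
        xs = [r[i] for r in subset]
        ys = [r[j] for r in subset]
        scores.append(((names[i], names[j]), abs(pearson(xs, ys))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Brushing the last two entries and ranking dimensions by how much they stand out.
    for name, score in rank_dimensions(ROWS, NAMES, brushed=[3, 4]):
        print(f"{name}: {score:.3f}")
    # Ranking pairs of dimensions over the whole dataset.
    for pair, score in rank_pairs(ROWS, NAMES):
        print(f"{pair}: {score:.3f}")
```

In an interactive setting, `brushed` would presumably be refreshed whenever the selection in a linked view changes, and the ranked lists recomputed. Any other summary (median, variance, a rank correlation) could be substituted for the two used in this sketch.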
Related resources
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern, powerful tools that can solve many important problems people face today. Support vector regression (SVR), a member of the machine learning family, is a way to build a regression model. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...
Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking
This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. Recently the literature has contained numerous references to the use of hybrid selection algorithms: based on a filter ranking, they perform an incremental wrapper selection over that ranking. Though working fine, these methods still have their problems: (1)...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High-dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...